Why Ignoring is Useful - Lab 4

Has there ever been a time when something kept popping up, and you would like nothing more than to have it ripped away from your sight so that you can live in blissful ignorance of its existence? Well, have I got something for you!

While this new feature that I have implemented into AbdulMAbdi's link checker tool, "deadOrNot", is useful for leaving a specified link off of a long list of URLs, it unfortunately can't solve ALL of your problems. But if you are looking for a way to weed out a group of similar links, then this is a match made in heaven!

Ok, back to reality. Setting up this new feature was a fun way for me to look at another approach to using Python for a similar task, and to learn new ways to do the exact same thing. From using argparse vs. click, to setting up threads for more streamlined processing, I made sure to take notes so that I could introduce certain new features to my own project in the future.
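To give a sense of that difference, here is a tiny side-by-side sketch of the same command-line flag declared with each library. This is purely illustrative; it is not code from deadOrNot or from my own tool, and every name in it is made up.

    # Two ways to declare the same command-line flag; both snippets are illustrative.

    # argparse (standard library)
    import argparse

    def argparse_cli(argv=None):
        parser = argparse.ArgumentParser(description="check links")
        parser.add_argument("-i", "--ignore", help="file of URLs to skip")
        return parser.parse_args(argv)

    # click (third-party package: pip install click)
    import click

    @click.command()
    @click.option("-i", "--ignore", help="file of URLs to skip")
    def click_cli(ignore):
        click.echo(f"ignoring links from: {ignore}")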

As for the set-up process of my new addition, it took a few new classes and functions and if statements (oh my!) to get where I needed to go, and it took some time to wrap my head around the program flow so that I knew exactly where my new lines of code were needed.

The first step was to set up a class to process the new text file that holds the links to ignore, along with any comments. For the most part, I was able to mirror what he already had set up to process his own text file of links, so all I had to fit in were a few adjustments to the regular expression, as well as a quick check to ensure that there were links to process in the file. On top of that, it ended up being necessary to have regex for commented lines, as well as for what an invalid URL would look like. To accomplish these tasks, I learned a useful new regex feature called a "negative lookahead assertion", which allows a match only when the text at that position does not match a given pattern; I found it helpful for screening out links (whether valid or invalid) based on specific criteria, like a leading comment marker. My new regular expressions ended up looking like this:

    # all comment lines (anything from '#' to the end of the line)
    self.fileComments = set(re.findall(r'#.*', self.fileText))

    # valid URLs: an http/https scheme followed by a host and an optional path
    self.fileUrl = set(re.findall(r'(?!# )(http|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?', self.fileText))

    # invalid URLs: host-like text that is missing the http/https scheme
    self.fileInvalidUrl = set(re.findall(r'(?!# )(?!http|https)(?!://)([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?', self.fileText))

Make note of the portions that follow this pattern: (?! ... ). That is the makeup of a negative lookahead assertion: the regex engine only allows a match at that position if the text there does not match whatever sits inside the parentheses after the exclamation mark.
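To make that behaviour concrete, here is a tiny standalone example (not taken from deadOrNot) showing how a negative lookahead changes what matches:

    import re

    # Match "http" only when it is NOT immediately followed by "s".
    pattern = re.compile(r'http(?!s)')

    print(bool(pattern.search("http://example.com")))   # True:  next char is ':', so (?!s) passes
    print(bool(pattern.search("https://example.com")))  # False: next char is 's', so (?!s) fails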

Once that was complete, all that needed to be done was to create a new pathway for when an ignore text file is included in the argument list with the "-i" or "--ignore" flag. I needed to make sure that an ignore file containing only comments, comments plus a link, a link on its own, or an invalid link would each be handled properly. From there, the program was essentially ready to handle the rest of the logic.
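For anyone curious what that pathway can look like, here is a rough sketch of wiring an ignore file into a link checker with argparse. The real deadOrNot code is structured differently (and may use click instead), so every function and parameter name below is hypothetical.

    import argparse
    import re
    import sys

    def parse_ignore_file(path):
        """Hypothetical helper: pull the URLs to skip out of an ignore file."""
        with open(path) as f:
            text = f.read()
        comments = set(re.findall(r'#.*', text))
        urls = set(re.findall(r'https?://\S+', text))
        if not urls and not comments:
            sys.exit(f"No usable entries found in ignore file: {path}")
        return urls

    def main():
        parser = argparse.ArgumentParser(description="Check links, optionally skipping some")
        parser.add_argument("file", help="file containing the URLs to check")
        parser.add_argument("-i", "--ignore", help="file containing URLs to skip")
        args = parser.parse_args()

        ignored = parse_ignore_file(args.ignore) if args.ignore else set()
        # ...collect URLs from args.file, drop any that appear in `ignored`,
        #    then hand the remainder to the link checker.

    if __name__ == "__main__":
        main()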

Unfortunately, no other students have reached out to work on my repo to implement this feature yet, so the main point of this lab (utilizing 'git remote' and 'git fetch' for merges) has not yet been explored. I will follow up with a new blog post once this new way of working with git has been fully explored.

Thanks again for reading, and stay tuned for more of my adventures in Open Source!
