Python 3.9+ is necessary to run this library. Poetry is used for packaging and dependency management.
The API comprises a main class FAAPI, two submission classes Submission and SubmissionPartial, a journal class Journal, and a user class User.
Once FAAPI is initialized, its methods can be used to crawl FA and return parsed objects.
from requests.cookies import RequestsCookieJar
import faapi
import orjson

# Session cookies "a" and "b" (placeholder values)
cookies = RequestsCookieJar()
cookies.set("a", "38565475-3421-3f21-7f63-3d341339737")
cookies.set("b", "356f5962-5a60-0922-1c11-65003b70308")

api = faapi.FAAPI(cookies)

# Fetch a submission together with its file
sub, sub_file = api.submission(12345678, get_file=True)

print(sub.id, sub.title, sub.author, f"{len(sub_file) / 1024:.2f}KiB")

# Save the submission metadata as JSON and the file under its original name
with open(f"{sub.id}.json", "wb") as f:
    f.write(orjson.dumps(dict(sub)))

with open(sub.file_url.split("/")[-1], "wb") as f:
    f.write(sub_file)

# Save the first gallery page of a user as JSON
gallery, _ = api.gallery("user_name", 1)
with open("user_name-gallery.json", "wb") as f:
    f.write(orjson.dumps(list(map(dict, gallery))))
At init, the FAAPI object downloads the robots.txt file from FA to determine the Crawl-delay and disallow values set therein. If not set in the robots.txt file, a crawl delay value of 1 second is used.
To respect this value, the default behaviour of the FAAPI object is to wait when a get request is made if the last request was performed more recently than the crawl delay value.
See under FAAPI for more details on this behaviour.
Furthermore, any get operation that points to a disallowed path from robots.txt will raise an exception. This check should not be circumvented, and the developer of this library does not take responsibility for violations of the TOS of Fur Affinity.
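The waiting and path-checking behaviour described above can be sketched with the standard library alone. This is a minimal illustration and not the library's implementation: the robots.txt content is a made-up sample, and load_rules, crawl_delay, and seconds_to_wait are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration; the real file may differ.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def crawl_delay(parser: RobotFileParser, agent: str = "*", default: float = 1.0) -> float:
    """Return the Crawl-delay for the agent, or a 1 second fallback if unset."""
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default


def seconds_to_wait(last_get: float, delay: float, now: float) -> float:
    """How long to sleep so that two requests are at least `delay` seconds apart."""
    return max(0.0, delay - (now - last_get))


rules = load_rules(SAMPLE_ROBOTS_TXT)
print(crawl_delay(rules))                      # delay read from the sample file
print(rules.can_fetch("*", "/private/page"))   # path disallowed by the sample file
print(seconds_to_wait(last_get=10.0, delay=2.0, now=10.5))
```

In a real client, the value returned by seconds_to_wait would be passed to time.sleep before each request, and a disallowed path would raise instead of being fetched.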
To access protected pages, cookies from an active session are needed. These cookies can be given to the FAAPI object as a list of dictionaries, each containing a name and a value field, or as a http.cookiejar.CookieJar object (requests.cookies.RequestsCookieJar and other objects inheriting from CookieJar are also supported). The cookies list should look like the following example:
cookies = [
    {"name": "a", "value": "38565475-3421-3f21-7f63-3d3413397537"},
    {"name": "b", "value": "356f5962-5a60-0922-1c11-65003b703038"},
]
The same cookies can be provided as a RequestsCookieJar object:
from requests.cookies import RequestsCookieJar
cookies = RequestsCookieJar()
cookies.set("a", "38565475-3421-3f21-7f63-3d3413397537")
cookies.set("b", "356f5962-5a60-0922-1c11-65003b703038")
To access session cookies, consult the manual of the browser used to log in.
Note: it is important not to log out of the session the cookies belong to, otherwise they will no longer work.
Note: as of April 2022 only cookies a and b are needed.
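To make the relationship between the two accepted formats concrete, the sketch below turns a list of name/value dictionaries into a standard-library CookieJar. It is only an illustration: cookies_to_jar is a hypothetical helper, and the domain, path, and secure settings are assumptions rather than values taken from the library.

```python
from http.cookiejar import Cookie, CookieJar


def cookies_to_jar(cookies: list[dict[str, str]], domain: str = ".furaffinity.net") -> CookieJar:
    """Build a CookieJar from [{"name": ..., "value": ...}, ...] entries."""
    jar = CookieJar()
    for item in cookies:
        jar.set_cookie(Cookie(
            version=0, name=item["name"], value=item["value"],
            port=None, port_specified=False,
            # Domain, path, and secure flag are assumptions for this sketch.
            domain=domain, domain_specified=True, domain_initial_dot=domain.startswith("."),
            path="/", path_specified=True,
            secure=True, expires=None, discard=True,
            comment=None, comment_url=None, rest={},
        ))
    return jar


jar = cookies_to_jar([
    {"name": "a", "value": "38565475-3421-3f21-7f63-3d3413397537"},
    {"name": "b", "value": "356f5962-5a60-0922-1c11-65003b703038"},
])
print(sorted(cookie.name for cookie in jar))
```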
FAAPI attaches a User-Agent header to every request. The user agent string is generated at startup in the following format: faapi/{library version} Python/{python version} {system name}/{system release}.
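A string in this format can be assembled with the standard platform module. The sketch below is an assumption about how the placeholders could be filled, not the library's code: build_user_agent is a hypothetical helper, and the library may use a different version granularity or source for the system fields.

```python
import platform


def build_user_agent(library_version: str) -> str:
    """Format: faapi/{library version} Python/{python version} {system name}/{system release}."""
    return (
        f"faapi/{library_version} "
        f"Python/{platform.python_version()} "
        f"{platform.system()}/{platform.release()}"
    )


# "0.0.0" is a placeholder version used only for this example.
print(build_user_agent("0.0.0"))
```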
This is the main object that handles all the calls to scrape pages and get submissions.
It holds 7 different fields:
session: CloudflareScraper
cfscrape session used for get requests
robots: urllib.robotparser.RobotFileParser
robots.txt handler
user_agent: str
user agent used by the session (property, cannot be set)
crawl_delay: float
crawl delay from robots.txt (property, cannot be set)
last_get: float
time of last get (UNIX time)
raise_for_unauthorized: bool = True
if set to True, raises an exception if a request is made and the resulting page is not from a login session
timeout: int | None = None
requests timeout in seconds for both page requests (e.g. submissions) and files
__init__(cookies: list[dict[str, str]] | CookieJar)
The class init has a single argument for the cookies. Cookies must be in the format mentioned above in #Cookies.
load_cookies(cookies: list[dict[str, str]] | CookieJar)
Load new cookies and create a new session.
Note: This method removes any cookies currently in use. To update or add single cookies, access them from the session object.
handle_delay()
Handles the crawl delay as set in the robots.txt.
check_path(path: str, *, raise_for_disallowed: bool = False) -> bool
Checks whether a given path is allowed by the robots.txt. If raise_for_disallowed is set to True, a DisallowedPath exception is raised on non-allowed paths.
connection_status -> bool
Returns the status of the connection.
login_status -> bool
Returns the login status.
get(path: str, **params) -> requests.Response
Returns a response object containing the result of the get operation on the given URL with the optional **params added to it (the URL provided is considered as a path from 'https://www.furaffinity.net/').
get_parsed(path: str, *, skip_page_check: bool = False, skip_auth_check: bool = False, **params) -> bs4.BeautifulSoup
Similar to get() but returns the parsed HTML from the normal get operation. If the GET request encountered an error, an HTTPError exception is raised. If skip_page_check is set to True, the parsed page is not checked for errors (e.g. non-existing submission). If skip_auth_check is set to True, the page is not checked for login status.
me() -> User | None
Returns the logged-in user as a User object if the cookies are from a login session.
frontpage() -> list[SubmissionPartial]
Fetches the latest submissions from Fur Affinity's front page.
submission(submission_id: int, get_file: bool = False, *, chunk_size: int = None) -> tuple[Submission, bytes | None]
Given a submission ID, it returns a Submission object containing the various metadata of the submission itself and a bytes object with the submission file if get_file is passed as True. The optional chunk_size argument is used for the request; if left to None or set to 0 the download is performed directly without streaming.
Note: the author UserPartial object of the submission does not contain the join_date field as it does not appear on submission pages.
submission_file(submission: Submission, *, chunk_size: int = None) -> bytes
Given a submission object, it downloads its file and returns it as a bytes object. The optional chunk_size argument is used for the request; if left to None or set to 0 the download is performed directly without streaming.
journal(journal_id: int) -> Journal
Given a journal ID, it returns a Journal object containing the various metadata of the journal.
user(user: str) -> User
Given a username, it returns a User object containing information regarding the user.
gallery(user: str, page: int = 1) -> tuple[list[SubmissionPartial], int | None]
Returns the list of submissions found on a specific gallery page, and the number of the next page. The returned page number is set to None if it is the last page.
scraps(user: str, page: int = 1) -> tuple[list[SubmissionPartial], int | None]
Returns the list of submissions found on a specific scraps page, and the number of the next page. The returned page number is set to None if it is the last page.
favorites(user: str, page: str = "") -> tuple[list[SubmissionPartial], str | None]
Downloads a user's favorites page. Because of how favorites pages work on FA, the page argument (and the one returned) are strings. If the favorites page is the last one, None is returned as the next page. An empty page value as argument is equivalent to page 1.
Note: favorites page "numbers" do not follow any scheme and are only generated server-side.
journals(user: str, page: int = 1) -> tuple[list[JournalPartial], int | None]
Returns the list of journals found on a specific journals page, and the number of the next page. The returned page number is set to None if it is the last page.
watchlist_to(user: str, page: int = 1) -> tuple[list[UserPartial], int | None]
Given a username, returns a list of UserPartial objects for each user that is watching the given user, and the next page. If the page is the last one, None is returned as the next page.
watchlist_by(user: str, page: int = 1) -> tuple[list[UserPartial], int | None]
Given a username, returns a list of UserPartial objects for each user that is watched by the given user, and the next page. If the page is the last one, None is returned as the next page.
Note: The last page returned by watchlist_to and watchlist_by may not be correct, as Fur Affinity doesn't seem to have a consistent behaviour when rendering the next page button. As such, it is safer to use an external algorithm to check whether the method is advancing the page but returning the same or no users.
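One possible shape for such an external check is sketched below. It walks the pages and stops as soon as a page yields no previously unseen entries, even if a next page is still reported. The names collect_all_pages and fake_watchlist are hypothetical, and the stub stands in for a real call so that the example runs offline.

```python
from typing import Callable, Hashable, Optional


def collect_all_pages(
    fetch_page: Callable[[str, int], tuple[list, Optional[int]]],
    user: str,
    key: Callable[[object], Hashable] = lambda item: item,
    max_pages: int = 1000,
) -> list:
    """Follow next-page values until None, or until a page adds nothing new."""
    collected, seen = [], set()
    page: Optional[int] = 1
    for _ in range(max_pages):
        if page is None:
            break
        items, next_page = fetch_page(user, page)
        fresh = []
        for item in items:
            item_key = key(item)
            if item_key not in seen:
                seen.add(item_key)
                fresh.append(item)
        if not fresh:
            break  # the page advanced but returned the same or no users
        collected.extend(fresh)
        page = next_page
    return collected


def fake_watchlist(user: str, page: int) -> tuple[list, Optional[int]]:
    """Stub that never reports a last page and repeats its final entry."""
    pages = {1: ["alice", "bob"], 2: ["carol"]}
    return pages.get(page, ["carol"]), page + 1


print(collect_all_pages(fake_watchlist, "user_name"))
```

With the real API, fetch_page could plausibly be api.watchlist_by or api.watchlist_to, with key=lambda u: u.name_url so that users are compared by URL name; that pairing is an untested assumption.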
A stripped-down class that holds basic user information. It is used to hold metadata gathered when parsing a submission, journal, gallery, scraps, etc.
name: str
display name with capital letters and extra characters such as "_"
status: str
user status (~, !, etc.)
title: str
the user title as it appears on their userpage
join_date: datetime
the date the user joined (defaults to timestamp 0)
user_tag: bs4.element.Tag
the user element used to parse information (placeholder, UserPartial is filled externally)
UserPartial objects can be directly cast to a dict object and iterated through.
Comparison with UserPartial can be made with either another UserPartial or User object (the URL names are compared), or a string (the URL name is compared to the given string).
__init__(user_tag: bs4.element.Tag = None)
To initialise the object, an optional bs4.element.Tag object can be passed containing the user element from a user page or user folder.
If no user_tag is passed then the object fields will remain at their default - empty - value.
name_url -> str
Property method that returns the URL-safe username
url -> str
Property method that returns the Fur Affinity URL to the user (https://www.furaffinity.net/user/{name_url}).
generate_user_icon_url() -> str
Generates the URL for the current user icon.
parse(user_page: bs4.BeautifulSoup = None)
Parses the stored user page for metadata. If user_page is passed, it overwrites the existing user_page value.
The main class storing all of a user's metadata.
name: str
display name with capital letters and extra characters such as "_"
status: str
user status (~, !, etc.)
title: str
the user title as it appears on their userpage
join_date: datetime
the date the user joined (defaults to timestamp 0)
profile: str
profile text in HTML format
profile_bbcode: str
profile text in BBCode format
stats: UserStats
user statistics stored in a namedtuple (views, submissions, favorites, comments_earned, comments_made, journals, watched_by, watching)
info: dict[str, str]
profile information (e.g. "Accepting Trades", "Accepting Commissions", "Character Species", etc.)
contacts: dict[str, str]
contact links (e.g. Twitter, Steam, etc.)
user_icon_url: str
the URL to the user icon
watched: bool
True if the user is watched, False otherwise
watched_toggle_link: str | None
the link to toggle the watch status (/watch/ or /unwatch/ type link)
blocked: bool
True if the user is blocked, False otherwise
blocked_toggle_link: str | None
the link to toggle the block status (/block/ or /unblock/ type link)
user_page: bs4.BeautifulSoup
the user page used to parse the object fields
User objects can be directly cast to a dict object and iterated through.
Comparison with User can be made with either another User or UserPartial object (the URL names are compared), or a string (the URL name is compared to the given string).
__init__(user_page: bs4.BeautifulSoup = None)
To initialise the object, an optional bs4.BeautifulSoup object can be passed containing the parsed HTML of a user page.
If no user_page is passed then the object fields will remain at their default - empty - value.
name_url -> str
Property method that returns the URL-safe username
url -> str
Property method that returns the Fur Affinity URL to the user (https://www.furaffinity.net/user/{name_url}).
generate_user_icon_url() -> str
Generates the URL for the current user icon.
parse(user_page: bs4.BeautifulSoup = None)
Parses the stored user page for metadata. If user_page is passed, it overwrites the existing user_page value.
This object contains partial information gathered when parsing a journals folder. It contains the following fields:
id: int
journal ID
title: str
journal title
date: datetime
upload date as a datetime object (defaults to timestamp 0)
author: UserPartial
journal author (filled only if the journal is parsed from a bs4.BeautifulSoup page)
stats: JournalStats
journal statistics stored in a named tuple (comments (count))
content: str
journal content in HTML format
content_bbcode: str
journal content in BBCode format
mentions: list[str]
the users mentioned in the content (if they were mentioned as links, e.g. :iconusername:, @username, etc.)
journal_tag: bs4.element.Tag
the journal tag used to parse the object fields
JournalPartial objects can be directly cast to a dict object or iterated through.
Comparison with JournalPartial can be made with either another JournalPartial or Journal object (the IDs are compared), or an integer (the JournalPartial.id value is compared to the given integer).
__init__(journal_tag: bs4.element.Tag = None)
JournalPartial takes one optional parameter: a journal section tag from a journals page.
If no journal_tag is passed then the object fields will remain at their default - empty - value.
url -> str
Property method that returns the Fur Affinity URL to the journal (https://www.furaffinity.net/journal/{id}).
parse(journal_item: bs4.element.Tag = None)
Parses the stored journal tag for information. If journal_tag is passed, it overwrites the existing journal_tag value.
This object contains full information gathered when parsing a journal page. It contains the same fields as JournalPartial with the addition of header, footer, and comments:
id: int
journal ID
title: str
journal title
date: datetime
upload date as a datetime object (defaults to timestamp 0)
author: UserPartial
journal author (filled only if the journal is parsed from a bs4.BeautifulSoup page)
stats: JournalStats
journal statistics stored in a named tuple (comments (count))
content: str
journal content in HTML format
content_bbcode: str
journal content in BBCode format
header: str
journal header in HTML format (if present)
footer: str
journal footer in HTML format (if present)
mentions: list[str]
the users mentioned in the content (if they were mentioned as links, e.g. :iconusername:, @username, etc.)
comments: list[Comment]
the comments to the journal, organised in a tree structure
journal_page: bs4.BeautifulSoup
the journal page used to parse the object fields
Journal objects can be directly cast to a dict object or iterated through.
Comparison with Journal can be made with either another Journal or JournalPartial object (the IDs are compared), or an integer (the Journal.id value is compared to the given integer).
__init__(journal_page: bs4.BeautifulSoup = None)
Journal takes one optional journal page argument.
If no journal_page is passed then the object fields will remain at their default - empty - value.
url -> str
Property method that returns the Fur Affinity URL to the journal (https://www.furaffinity.net/journal/{id}).
parse(journal_page: bs4.BeautifulSoup = None)
Parses the stored journal page for information. If journal_page is passed, it overwrites the existing journal_page value.
This lightweight submission object is used to contain the information gathered when parsing gallery, scraps, and favorites pages. It contains only the following fields:
id: int
submission ID
title: str
submission title
author: UserPartial
submission author (only the name field is filled)
rating: str
submission rating [general, mature, adult]
type: str
submission type [text, image, etc.]
thumbnail_url: str
the URL to the submission thumbnail
submission_figure: bs4.element.Tag
the figure tag used to parse the object fields
SubmissionPartial objects can be directly cast to a dict object or iterated through.
Comparison with SubmissionPartial can be made with either another SubmissionPartial or Submission object (the IDs are compared), or an integer (the SubmissionPartial.id value is compared to the given integer).
__init__(submission_figure: bs4.element.Tag = None)
To initialise the object, an optional bs4.element.Tag object can be passed containing the parsed HTML of a submission figure tag.
If no submission_figure is passed then the object fields will remain at their default - empty - value.
url -> str
Property method that returns the Fur Affinity URL to the submission (https://www.furaffinity.net/view/{id}).
parse(submission_figure: bs4.element.Tag = None)
Parses the stored submission figure tag for information. If submission_figure is passed, it overwrites the existing submission_figure value.
The main class that parses and holds submission metadata.
id: int
submission ID
title: str
submission title
author: UserPartial
submission author (only the name, title, and user_icon_url fields are filled)
date: datetime
upload date as a datetime object (defaults to timestamp 0)
tags: list[str]
tags list
category: str
category
species: str
species
gender: str
gender
rating: str
rating
stats: SubmissionStats
submission statistics stored in a named tuple (views, comments (count), favorites)
type: str
submission type (text, image, etc.)
description: str
description in HTML format
description_bbcode: str
description in BBCode format
footer: str
footer in HTML format
mentions: list[str]
the users mentioned in the description (if they were mentioned as links, e.g. :iconusername:, @username, etc.)
folder: str
the submission folder (gallery or scraps)
user_folders: list[SubmissionUserFolder]
user folders stored in a list of named tuples (name, url, group (if any))
file_url: str
the URL to the submission file
thumbnail_url: str
the URL to the submission thumbnail
prev: int
the ID of the previous submission (if any)
next: int
the ID of the next submission (if any)
favorite: bool
True if the submission is a favorite, False otherwise
favorite_toggle_link: str
the link to toggle the favorite status (/fav/ or /unfav/ type URL)
comments: list[Comment]
the comments to the submission, organised in a tree structure
submission_page: bs4.BeautifulSoup
the submission page used to parse the object fields
Submission objects can be directly cast to a dict object and iterated through.
Comparison with Submission can be made with either another Submission or SubmissionPartial object (the IDs are compared), or an integer (the Submission.id value is compared to the given integer).
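The dict cast and the ID-based comparison can be pictured with a small stand-in class. SubmissionSketch below is hypothetical and is not the library's Submission; it assumes the cast works by having __iter__ yield key/value pairs, and it carries only two fields to keep the sketch short.

```python
class SubmissionSketch:
    """Hypothetical stand-in illustrating dict casting and ID comparison."""

    def __init__(self, id_: int, title: str):
        self.id = id_
        self.title = title

    def __iter__(self):
        # Yielding (key, value) pairs is what lets dict(obj) build a mapping.
        yield "id", self.id
        yield "title", self.title

    def __eq__(self, other):
        if isinstance(other, SubmissionSketch):
            return self.id == other.id  # compare IDs, ignore other fields
        if isinstance(other, int):
            return self.id == other     # compare the ID to a plain integer
        return NotImplemented


sub = SubmissionSketch(12345678, "Example title")
print(dict(sub))
print(sub == 12345678, sub == SubmissionSketch(12345678, "Another title"))
```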
__init__(submission_page: bs4.BeautifulSoup = None)
To initialise the object, an optional bs4.BeautifulSoup object can be passed containing the parsed HTML of a submission page.
If no submission_page is passed then the object fields will remain at their default - empty - value.
url -> str
Property method that returns the Fur Affinity URL to the submission (https://www.furaffinity.net/view/{id}).
parse(submission_page: bs4.BeautifulSoup = None)
Parses the stored submission page for metadata. If submission_page is passed, it overwrites the existing submission_page value.
This object class contains comment metadata and is used to build a tree structure with the comments and their replies.
id: int
the comment ID
author: UserPartial
the user who posted the comment
date: datetime
the date the comment was posted
text: str
the comment text in HTML format
text_bbcode: str
the comment text in BBCode format
replies: list[Comment]
list of replies to the comment
reply_to: Comment | int | None
the parent comment, if the comment is a reply. The variable type is int only if the comment is parsed outside the parse method of a Submission or Journal (e.g. by creating a new comment with a comment tag), and when iterating over the parent object (to avoid infinite recursion errors), be it Submission, Journal or another Comment.
edited: bool
True if the comment was edited, False otherwise
hidden: bool
True if the comment was hidden, False otherwise (if the comment was hidden, the author and date fields will default to their empty values)
parent: Submission | Journal | None
the Submission or Journal object the comments are connected to
comment_tag: bs4.element.Tag
the comment tag used to parse the object fields
Comment objects can be directly cast to a dict object and iterated through.
Comparison with Comment can be made with either another comment (the IDs are compared), or an integer (the Comment.id value is compared to the given integer).
Note: The __iter__ method of Comment objects automatically removes recursion. The parent variable is set to None and reply_to is set to the comment's ID.
Note: Because each comment contains the parent Submission or Journal object (which contains the comment itself) and the replied comment object, some iterations may cause infinite recursion errors, for example when using the copy.deepcopy function. If such iterations are needed, simply set the parent variable to None and the reply_to variable to None or the comment's ID. This can be done easily after flattening the comments list with faapi.comment.flatten_comments; the comments can then be sorted again with faapi.comment.sort_comments, which will also restore the reply_to values to Comment objects.
__init__(tag: bs4.element.Tag = None, parent: Submission | Journal = None)
To initialise the object, an optional bs4.element.Tag object can be passed containing the comment tag as taken from a submission/journal page.
The optional parent argument sets the parent variable described above.
If no tag is passed then the object fields will remain at their default - empty - value.
url -> str
Property method that returns the Fur Affinity URL to the comment (e.g. https://www.furaffinity.net/view/12345678#cid:1234567890). If the parent variable is None, the property returns an empty string.
parse(tag: bs4.element.Tag = None)
Parses the stored tag for metadata. If tag is passed, it overwrites the existing tag value.
These extra functions can be used to operate on a list of comments. They only alter the order and structure, but they do not touch any of the metadata.
faapi.comment.sort_comments(comments: list[Comment]) -> list[Comment]
Sorts a list of comments into a tree structure. Replies are overwritten.
faapi.comment.flatten_comments(comments: list[Comment]) -> list[Comment]
Flattens a list of comments. Replies are not modified.
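The semantics of the two functions can be illustrated with a stand-in comment type. The sketch below is not the library's code: CommentSketch, flatten, and sort_into_tree are hypothetical names, and the depth-first order used when flattening is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass(eq=False)  # eq=False avoids recursive comparisons on cyclic links
class CommentSketch:
    id: int
    reply_to: Optional[Union["CommentSketch", int]] = None
    replies: list["CommentSketch"] = field(default_factory=list)


def flatten(comments: list[CommentSketch]) -> list[CommentSketch]:
    """Return every comment and reply in one list; replies are left untouched."""
    flat = []
    for comment in comments:
        flat.append(comment)
        flat.extend(flatten(comment.replies))
    return flat


def sort_into_tree(comments: list[CommentSketch]) -> list[CommentSketch]:
    """Rebuild the tree from reply_to values; existing replies are overwritten."""
    by_id = {c.id: c for c in comments}
    for c in comments:
        c.replies = []
    roots = []
    for c in comments:
        parent_id = c.reply_to.id if isinstance(c.reply_to, CommentSketch) else c.reply_to
        parent = by_id.get(parent_id)
        if parent is None:
            roots.append(c)  # top-level comment, or parent not in the list
        else:
            c.reply_to = parent  # restore the reference to the parent object
            parent.replies.append(c)
    return roots


flat_list = [CommentSketch(1), CommentSketch(2, reply_to=1), CommentSketch(3, reply_to=2), CommentSketch(4)]
tree = sort_into_tree(flat_list)
print([c.id for c in tree], [c.id for c in flatten(tree)])
```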
Using the tree structure generated by the library, it is trivial to build a graph visualisation of the comment tree using the DOT language.
# Assumes `api` is an initialised faapi.FAAPI object, as in the usage example
submission, _ = api.submission(12345678)
comments = faapi.comment.flatten_comments(submission.comments)
with open("comments.dot", "w") as f:
    f.write("digraph {\n")
    # Top-level comments are attached to a synthetic "parent" node
    for comment in [c for c in comments if c.reply_to is None]:
        f.write(f"    parent -> {comment.id}\n")
    # One edge for each reply
    for comment in comments:
        for reply in comment.replies:
            f.write(f"    {comment.id} -> {reply.id}\n")
    f.write("}")
digraph {
    parent -> 157990848
    parent -> 157993838
    parent -> 157997294
    157990848 -> 158014077
    158014077 -> 158014816
    158014816 -> 158093180
    158093180 -> 158097024
    157993838 -> 157998464
    157993838 -> 158014126
    157997294 -> 158014135
    158014135 -> 158014470
    158014135 -> 158030074
    158014470 -> 158093185
    158030074 -> 158093199
}
The graph above was generated with quickchart.io
The BBCode fields make it possible to convert between the raw HTML recovered from Fur Affinity and BBCode tags that follow FA's guidelines. Conversion from HTML to BBCode covers all known tags and preserves all newlines and spacing.
BBCode text can be converted to Fur Affinity's HTML using the faapi.parse.bbcode_to_html() function. The majority of submissions can be converted back and forth between HTML and BBCode without any information loss. However, the parser rules are still a work in progress and there are many edge cases where unusual text and formatting cause the parser to generate incorrect HTML.
The following are the exceptions explicitly raised by the FAAPI functions. The exceptions deriving from ParsingError are chosen depending on the content of the page. Because Fur Affinity doesn't use HTTP status codes besides 404, the page is checked against a static list of known error messages/page titles in order to determine the specific error to be used. If no match is found, then the ServerError (if the page has the "Server Error" title) or the more general NoticeMessage exceptions are used instead. The actual error message parsed from the page is used as argument for the exceptions, so that it can be analysed when caught.
DisallowedPath(Exception)
The path is not allowed by the robots.txt.
Unauthorized(Exception)
The user is not logged-in.
ParsingError(Exception)
An error occurred while parsing the page.
NonePage(ParsingError)
The parsed page is None.
NotFound(ParsingError)
The resource could not be found (general 404 page or non-existing submission, user, or journal).
NoTitle(ParsingError)
The parsed page is missing a title.
DisabledAccount(ParsingError)
The resource belongs to a disabled account.
ServerError(ParsingError)
The page contains a server error notice.
NoticeMessage(ParsingError)
A notice of unknown type was found in the page.
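The selection logic described above can be sketched as follows. The exception classes are local stand-ins that mirror part of the documented hierarchy, raise_for_notice is a hypothetical helper, and the phrases in KNOWN_MESSAGES are made up for illustration; the library's actual list is not reproduced here.

```python
class ParsingError(Exception): ...
class NotFound(ParsingError): ...
class DisabledAccount(ParsingError): ...
class ServerError(ParsingError): ...
class NoticeMessage(ParsingError): ...


# Made-up phrases for illustration only.
KNOWN_MESSAGES = {
    "could not be found": NotFound,
    "has been disabled": DisabledAccount,
}


def raise_for_notice(title: str, message: str) -> None:
    """Raise the most specific exception for an error notice found in a page."""
    lowered = message.lower()
    for phrase, exception_type in KNOWN_MESSAGES.items():
        if phrase in lowered:
            raise exception_type(message)
    if title == "Server Error":
        raise ServerError(message)
    raise NoticeMessage(message)  # fallback for notices of unknown type


try:
    raise_for_notice("System Error", "The submission could not be found.")
except ParsingError as err:
    # Catching the base class also catches every derived exception;
    # the parsed message is available as the first argument.
    print(type(err).__name__, err.args[0])
```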
When parsing some pages or converting HTML to BBCode, the Beautiful Soup library may give some warnings, for example MarkupResemblesLocatorWarning. These warnings are left enabled for clarity, but can be disabled manually using the warnings.filterwarnings function.
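A minimal sketch of disabling such a warning for a single call is shown below. call_quietly is a hypothetical helper; the fallback class only exists so that the example runs when Beautiful Soup is not installed or does not export the warning under that name.

```python
import warnings

try:
    from bs4 import MarkupResemblesLocatorWarning
except ImportError:
    class MarkupResemblesLocatorWarning(UserWarning):
        """Stand-in so the sketch runs without Beautiful Soup."""


def call_quietly(func, *args, **kwargs):
    """Run func with MarkupResemblesLocatorWarning suppressed, then restore filters."""
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=MarkupResemblesLocatorWarning)
        return func(*args, **kwargs)


def noisy_parse():
    # Simulates a parsing step that triggers the warning.
    warnings.warn("input looks like a filename", MarkupResemblesLocatorWarning)
    return "parsed"


print(call_quietly(noisy_parse))
```

Calling warnings.filterwarnings with the same arguments outside a catch_warnings block would disable the warning for the rest of the program instead of a single call.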
All contributions and suggestions are welcome!
If you have suggestions for fixes or improvements, you can open an issue with your idea; see #Issues for details.
If any problem is encountered during usage of the program, an issue can be opened on GitHub.
Issues can also be used to suggest improvements and features.
When opening an issue for a problem, please copy the error message and describe the operation in progress when the error occurred.